ABSTRACT


The paper describes the underlying theoretical framework
and operational details of two programs, ESEL and AQ11, for computer
induction within the framework of the variable-valued logic system
VL1 (i.e., a statement calculus which involves variables with an
arbitrary number of discrete values [Michalski 1974]).

ESEL  -  A supporting program for selecting the 'most representative'
learning and/or testing VL1 events from a large data base of events.
The program provides the input to the program AQ11.

AQ11  -  A program for incremental generation of VL1 hypotheses,
which are generalized and optimized descriptions of input event sets.
The program also provides a facility for evaluating the performance
of these inferred hypotheses on testing events.

Given a large set of examples describing certain objects
or situations, program ESEL selects from them a small subset of
the most representative ones. The examples have to be in the
form of VL1 events, i.e., in the form of sequences of values of certain
discrete variables (or descriptors). In selecting the events,
the program distinguishes among three types of descriptors: nominal
descriptors, whose value set is an unordered set; linear descriptors,
whose value set is a linearly ordered set; and structured
descriptors, whose value set is a tree-ordered set.

Events selected by ESEL are input to program AQ11,
which generates VL1 hypotheses describing the events. The program
can work incrementally, i.e., given a working hypothesis (a set of
rules) obtained at some stage, and a set of events, the program can
modify the hypothesis to make it consistent with the events.

Program  AQ11  also  has  the  facility  to  test  the  performance 
of  a  given  hypothesis  on  a  set  of  testing  events,  and  to  compute 
an  extended  confusion  matrix. 


1.   SELECTION  OF  THE  MOST  REPRESENTATIVE  TRAINING  EXAMPLES:   Program  ESEL 

1.1  Basic  Concepts  and  Notation 

The purpose of program ESEL is to select a subset of the
most representative events from a large number of VL1 events
(see the definition below). The need for this program arises
when a given training set of events is very large (say, a few hundred
or more events) and the AQVAL/1 inductive programs (program AQ11,
described here, and also AQ7 [Larson, Michalski 75], Uniclass-RS [Stepp 76],
AQ9 [Cuneo 75], AQPLUS [Forsburg 75], SYM-4 [Jensen 75], YAL [Yalow 77])
either could not accept such a large number of events or would run very
inefficiently.

The theoretical background for ESEL is given in [Michalski 75].
Here, for completeness, we summarize it and then describe the
program itself.

Let $\mathcal{E}(d_1, d_2, \ldots, d_n)$ or, briefly, $\mathcal{E}$, denote the set of
all n-tuples $(x'_1, x'_2, \ldots, x'_n)$, $x'_i \in D_i$, $i = 1, 2, \ldots, n$, where $D_i$ are
certain finite sets and $d_i$ is the cardinality of $D_i$. Thus:

$$\mathcal{E}(d_1, \ldots, d_n) = D_1 \times D_2 \times \cdots \times D_n \tag{1}$$

$\mathcal{E}$ is called the universe of events, and its elements are called events. $x'_i$
and $D_i$, $i = 1, 2, \ldots, n$, denote a value and the domain (value set) of the
descriptor $x_i$, respectively. Descriptors* are certain direct (or derived)
measurements or characteristics of objects or situations.


*A descriptor, as described here, is equivalent to a variable. (In a more
general sense, not considered here, a descriptor can also be an n-ary
relation or n-argument function.)


Depending  on  the  nature  of  a  descriptor,  its  domain 
may  have  a  different  structure,  e.g.,  it  can  be  a  linearly  ordered  set,  a 
partially  ordered  set,  or  an  unordered  set. 

Three  categories  of  descriptors  are  distinguished  here: 

I.  Nominal  or  cartesian  descriptors  whose  domains  are  sets  that 
have  no  order. 

II.  Linear  (interval)   descriptors  whose  domains  are  any  linearly 
ordered  sets.   Thus,  this  category  includes  ordinal,  interval,  ratio  and  absolute 
variables,  as  defined  in  mathematical  psychology. 

III.  Structured descriptors whose domains are partially ordered
sets $\langle S, \ge \rangle$ that are neither linearly ordered nor totally unordered. In
this paper we will restrict ourselves to the case of partially ordered sets
having the property that for any two elements $a, b \in S$, there exists at least
one element $\alpha \in S$ such that

$$a \le \alpha \quad \text{and} \quad b \le \alpha$$

Sets with such structures will be called generalization structures or
g-structures. Figure 1 presents a Hasse diagram of a g-structure.


An  example  of  g-structure 
Figure  1 


*The term used in our previous papers on this subject.


In the diagram, a relation $a \ge b$ is represented by placing node $a$ above
node $b$ and linking the nodes by an arc.

Examples of descriptors: the blood type of a person is a
nominal descriptor, the height or weight of a person is a
linear descriptor, and the position of the person in the
hierarchy of an institution is a structured descriptor.

Suppose, without loss of generality, that we are given two
event sets, $E^1$ and $E^0$, where $E^1, E^0 \subset \mathcal{E}$, each associated with a certain
decision or action $k$ ($k = 1$ and $0$, respectively). These sets define a set of
functions

$$\{ f : \mathcal{E} \to D \} \tag{2}$$

such that

$$\{ e \mid f(e) = k \} = E^k, \quad k = 1, 0 \tag{3}$$

where $e \in \mathcal{E}$ and $D = \{0, 1, *\}$; '*' in D means 'no decision'.

A problem of inductive inference is to determine an expression V
of a function f which is most desirable, with respect to some criterion, among
all the expressions of all functions (2). Such an expression will usually
also assign values 1 or 0 to events not included in $E^k$; i.e., the expression
will be a certain generalization of the sets $E^k$. Namely, the initial sets $E^k$
will be transformed into sets $E^k(V) \supseteq E^k$, where

$$E^k(V) = \{ e \mid V(e) = k \}, \quad k = 1, 0$$

and $V(e)$ is the value of the expression V for the event e.

AQVAL/1  programs  (Michalski  77)  can  be  used  to  solve  the  problem 
if  the  expression  V  is  restricted  to  the  class  of  DVL1  expressions  and  the  sizes 
of  the  sets  Ek  do   not  exceed  certain  limits.  When  sets  Ek  are  very 
large   (say,  a  few  hundred  elements  or  more) ,  then  the  computational  time  of 
the  programs  may  be  too  long.   The  problem  arises  as  to  whether  sets  Ek  could 
not  be  reduced  to  more  manageable  sizes  and  still  provide  sufficient  information 
about  decision  classes  from  the  viewpoint  of  inductive  inference. 

If a precise measure of a "degree of representativeness" of each
event $e \in E^k$ were available, then an event reduction process could be performed
simply by selecting events whose 'degree of representativeness' is above a
certain threshold. For example, the frequency of occurrence of an object with the
description e in the class k could serve as an estimate of such a measure.
This estimate, however, in many practical problems is either not available or
is not adequate. Consequently, some other means must be developed for selecting
the 'most representative' events.

There  can  be  a  number  of  different  methods  of  solving  this  problem 
(see,  e.g.,  [Michalski  75]).   Program  ESEL  implements  a  method  called  'outstanding 
representatives'  (OR). 
1.2  An  Outline  of  the  OR  Method 

In  this  method,  the  original  event  set  is  reduced  to  a  set  consisting 
of  events  which  are  most  'distant'  from  each  other.   An  important  feature  of 
this  method  is  that  the  resulting  set  will  include  events  which  delineate  the 
'outside'  of  the  events  in  the  original  set.   For  example,  if  the  'true'  but 
unknown  decision  class  is  a  circle  and  its  interior  and  the  original  event 
set  consists  of  a  number  of  randomly  selected  points  from  this  class,  then 
the  reduced  set  will  be  a  set  of  points  lying  on  or  close  to  the  perimeter  of 
the  circle  and  spanning  a  polygon  of  approximately  equal  sides. 

This  method  is,  however,  very  sensitive  to  events  which  differ 
significantly  from  the  rest  of  the  events  in  the  original  set.   If  such  events 
happened  to  be  errors,  then  these  errors  would  have  a  strong  effect  on  the 
result.   To  circumvent  this  problem,  an  additional  test  could  be  done,  which 
selects  an  event  only  if  it  has  a  certain  number  of  'close'  neighbors*.  Figure 
1  illustrates  this  method. 


*This feature is not implemented.


Let $e_1$ and $e_2$ denote two given events:

$$e_1 = (x'_1, x'_2, \ldots, x'_{n_1},\ x'_{n_1+1}, x'_{n_1+2}, \ldots, x'_{n_2},\ x'_{n_2+1}, x'_{n_2+2}, \ldots, x'_n)$$

$$e_2 = (\underbrace{x''_1, x''_2, \ldots, x''_{n_1}}_{\text{linear variables}},\ \underbrace{x''_{n_1+1}, x''_{n_1+2}, \ldots, x''_{n_2}}_{\text{structured variables}},\ \underbrace{x''_{n_2+1}, x''_{n_2+2}, \ldots, x''_n}_{\text{nominal variables}})$$

where $x'_i$ and $x''_i$ denote values of variable $x_i$ in $e_1$ and $e_2$, respectively. Assume,
without loss of generality, that the first $n_1$ variables in the events above are
interval variables, the following $n_2 - n_1$ variables (indices $n_1+1, \ldots, n_2$) are
structured variables, and those remaining are nominal variables†.

First, we will define a measure of the distance $d(x'_i, x''_i)$ between
the values of a variable, depending on the type of the variable:

•  For linear variables:

$$d(x'_i, x''_i) = \frac{|x'_i - x''_i|}{\lambda_i}, \quad 1 \le i \le n_1 \tag{4}$$

assuming that the domain of each linear variable is represented
by the set $\{0, 1, 2, \ldots, \lambda_i\}$, $\lambda_i = d_i - 1$ ($d_i$ is the cardinality of $D_i$, i.e., of the
domain of $x_i$).

•  For structured variables:

$$d(x'_i, x''_i) = \frac{NB}{MNB}, \quad n_1 < i \le n_2 \quad \text{(see Figure 2)} \tag{5}$$

where NB is the number of branches on the shortest path linking $x'_i$ with $x''_i$
in the Hasse diagram representing the domain of $x_i$, and MNB is the maximum number
of branches on the shortest path linking any two nodes of the diagram.

•  For nominal variables:

$$d(x'_i, x''_i) = \begin{cases} 1, & \text{if } x'_i \text{ is not identical to } x''_i \\ 0, & \text{otherwise} \end{cases} \quad (n_2 < i \le n)$$

Two  types  of  distance  measures  between  events  are  considered: 


†It is assumed here that if the domain of a structured variable is not a
g-structure, then the variable is treated as a cartesian variable.


Figure 2. Illustration of a distance between values of a structured variable.
For the nodes a, b, c of the diagram: NB(a,b) = 3, NB(b,c) = 6, NB(a,c) = 7
and MNB = 9, so that d(a,b) = 3/9, d(b,c) = 6/9 and d(a,c) = 7/9.


(1)  Quantized measure:

$$d_q(e_1, e_2) = \sum_{i=1}^{n_2} q\bigl(d(x'_i, x''_i), T_i\bigr) + \sum_{i=n_2+1}^{n} w\, d(x'_i, x''_i) \tag{6}$$

where $T_i = (t_{i1}, t_{i2}, \ldots, t_{ip})$ is a sequence of thresholds $t_{ij}$ associated with
variable $x_i$, $i = 1, 2, \ldots, n_2$,

q is a quantization function $q: \{d(x'_i, x''_i)\} \times \{T_i\} \to \{0, 1, 2, \ldots, p\}$
defined as

$$q\bigl(d(x'_i, x''_i), T_i\bigr) = \begin{cases} 0, & \text{if } d(x'_i, x''_i) \le t_{i1} \\ 1, & \text{if } t_{i1} < d(x'_i, x''_i) \le t_{i2} \\ \ \vdots \\ p, & \text{if } t_{ip} < d(x'_i, x''_i) \end{cases}$$

w  -  a 'weight' assigned to nominal variables in relation to
non-nominal variables.

(2)  Continuous measure:

$$d_c(e_1, e_2) = \sum_{i=1}^{n} w_i\, d(x'_i, x''_i) \tag{7}$$

where $w_i$ is a weight associated with the variable $x_i$.

The threshold sequence $T_i$ in the quantized measure and the weight
$w_i$ in the continuous measure represent two different means of controlling the
effect of a single linear or structured variable on the distance between events.

Control by a threshold sequence avoids the multiplication
that control by weight requires when computing the distance, and
thus is computationally simpler than the latter. It requires, however, that
the user specify the value of p and p thresholds for each variable, as
opposed to the single number (weight) required in control by weight.
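
The two measures can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the ESEL code: the function names are hypothetical, the path lengths NB and MNB of a structured variable are supplied by the caller rather than computed from a Hasse diagram, and each threshold sequence is assumed to be in ascending order.

```python
from typing import Sequence


def d_linear(a: int, b: int, lam: int) -> float:
    """Equation (4): |a - b| / lam for a domain coded as {0, ..., lam}."""
    return abs(a - b) / lam


def d_structured(nb: int, mnb: int) -> float:
    """Equation (5): NB / MNB; both path lengths are given by the caller."""
    return nb / mnb


def d_nominal(a, b) -> float:
    """1 if the two values differ, 0 otherwise."""
    return 0.0 if a == b else 1.0


def quantize(d: float, thresholds: Sequence[float]) -> int:
    """q(d, T): number of thresholds lying strictly below d (T ascending)."""
    return sum(1 for t in thresholds if d > t)


def d_quantized(value_dists, thresholds, n2, w):
    """Equation (6): quantized part over the first n2 (non-nominal)
    variables plus the weighted part over the nominal variables."""
    head = sum(quantize(value_dists[i], thresholds[i]) for i in range(n2))
    tail = sum(w * d for d in value_dists[n2:])
    return head + tail


def d_continuous(value_dists, weights):
    """Equation (7): weighted sum of the per-variable distances."""
    return sum(w * d for w, d in zip(weights, value_dists))


# One linear, one structured and one nominal variable (illustrative values).
dists = [d_linear(0, 3, 3), d_structured(3, 9), d_nominal("A", "B")]
T = [(0.25, 0.5), (0.25, 0.5)]           # hypothetical thresholds, p = 2
print(d_quantized(dists, T, n2=2, w=2))  # quantized distance
print(d_continuous(dists, [1, 1, 1]))    # continuous distance
```

The quantized variant needs only comparisons for the non-nominal variables, which is the computational saving the text refers to.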


1.3  Algorithm

We will now describe a specific algorithm implementing the OR
method. The algorithm is applied in the same way to every set $E^k$,
k = 1, 2, .... Let us then assume that E stands for any one of these sets.
Either of the distance measures introduced in section 1.2 can be used in
the algorithm.

1.  For each $e \in E$ determine the distance $d(e, e_0)$, where
$e_0 = (0, 0, 0, \ldots, 0)$.

2.  Find events $e_{min}$ and $e_{max}$ such that

$$d(e_{min}, e_0) = \min_{e \in E} d(e, e_0)$$

$$d(e_{max}, e_0) = \max_{e \in E} d(e, e_0)$$

3.  Determine the distance $d(e_{min}, e_{max})$ and divide it into
r intervals*, where r is between 0.01 and 0.1 of the
size c(E) of the original set E (e.g., if c(E) = 3000
then r is between 30 and 300).

4.  Partition E into r subsets, $E_1, E_2, \ldots, E_r$, such that
$E_i$ consists of events whose distance $d(e, e_0)$ lies in
the ith interval, i = 1, 2, ..., r:

$$a_{i-1} < d(e, e_0) \le a_i$$

where $a_{i-1}$ and $a_i$ are the endpoints of the ith interval
($a_0 = d(e_{min}, e_0)$ and $a_r = d(e_{max}, e_0)$).

*The intervals do not have to be equal. The desired situation here is to
have intervals which will lead to the subsets $E_i$ (determined in step 4)
of approximately the same size.


5.  From each subset $E_i$, i = 1, 2, ..., r, select a subset
$E_{is}$ consisting of s events (where s is such that r·s gives
the desired size of the reduced event set). The
selection is made in the following way:

1.)  Find $e_1$ and $e_2$ in $E_i$ such that*

$$d(e_1, e_2) = \max_{e_a, e_b \in E_i} d(e_a, e_b)$$

2.)  Find $e_3$ such that

$$d(e_3, e_1) \cdot d(e_3, e_2) = \max_{e \in E_i} \bigl( d(e, e_1) \cdot d(e, e_2) \bigr)$$

. . .

s-1.)  Find $e_s$ such that

$$\prod_{j=1}^{s-1} d(e_s, e_j) = \max_{e \in E_i} \prod_{j=1}^{s-1} d(e, e_j)$$

where $\prod$ denotes the arithmetic multiplication.

6.  The union of the sets $E_{is}$:

$$E_s = \bigcup_{i=1}^{r} E_{is}$$

gives the reduced event set.


*A more computationally efficient process, though one which might lead to a
less desirable result, is to replace step 1.) by two steps:

1a)  find $e_1$ such that $d(e_1, e_0) = \min_{e \in E_i} d(e, e_0)$

1b)  find $e_2$ such that $d(e_1, e_2) = \max_{e \in E_i} d(e, e_1)$.

†The reason for using multiplication in steps 2.), ..., s-1.) is to select
events which are at similar distances from each other.
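
The steps above can be sketched as follows. This is a minimal illustration, not the ESEL implementation: the names `pick_spread` and `select_outstanding` are hypothetical, the distance function is passed in by the caller, s is assumed to be at least 1, and the partition of step 4 is done by rank (sorting on the distance from $e_0$ and cutting into chunks of equal size), which is one way to obtain the subsets of approximately equal size that the footnote to step 3 asks for.

```python
from math import prod


def pick_spread(subset, dist, s):
    """Step 5: greedily pick s mutually distant events from one subset."""
    if len(subset) <= s:
        return list(subset)
    # Step 1.): the pair of events with the largest mutual distance.
    pairs = ((a, b) for i, a in enumerate(subset) for b in subset[i + 1:])
    chosen = list(max(pairs, key=lambda p: dist(p[0], p[1])))
    # Steps 2.) to s-1.): maximise the product of distances to chosen events.
    while len(chosen) < s:
        rest = [e for e in subset if e not in chosen]
        if not rest:
            break
        chosen.append(max(rest, key=lambda e: prod(dist(e, c) for c in chosen)))
    return chosen[:s]


def select_outstanding(events, dist, r, s):
    """Steps 1 to 6 with a rank-based partition (an assumption, see above)."""
    origin = tuple(0 for _ in events[0])                     # e_0 = (0, ..., 0)
    ranked = sorted(events, key=lambda e: dist(e, origin))   # steps 1 and 2
    size = -(-len(ranked) // r)                              # ceiling division
    subsets = [ranked[k:k + size] for k in range(0, len(ranked), size)]
    reduced = []
    for subset in subsets:                                   # steps 5 and 6
        reduced.extend(pick_spread(subset, dist, s))
    return reduced


# Ten events on a line, with the absolute difference as the distance.
events = [(i,) for i in range(10)]
line_dist = lambda a, b: abs(a[0] - b[0])
print(select_outstanding(events, line_dist, r=2, s=2))
```

With one subset and s = 3 the sketch picks the two end points first and then a point near the middle, which mirrors the 'polygon of approximately equal sides' behaviour described in section 1.2.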




The number of operations required by the algorithm is approximately:

s-1 
N  =  c(E)  +  r  (cCEj)  +   22  j(t-j) 

where c(E) and c($E_i$) are the cardinalities of E and $E_i$, respectively (the $E_i$ are
assumed to be all of equal size). An 'operation' may involve computing the
distance between two events, the comparison of two distances, the comparison
of the distance with a threshold, etc. In the modified form of the algorithm
we have:

s-1 


N 


'  =  c(E)  +  r   .2  jc(E  ) 


For example, if c(E) = 3000, c($E_i$) = 100, r = 30, s = 10, then N = 273000
(N' = 268000), and the cardinality of the reduced set would be c($E_s$) = 300.

1.4  User's  Guide  for  ESEL 

INPUT  FILES 

PARMSX  -  A  file  with  information  about  variables. 

INST  -  A file with information about the sizes of event sets
which are in the data base and the sizes of representative
sets of events.

EVNT  -  The  file  containing  the  data  base. 
PARMSX  FILE 

This file contains the number of variables in the event descriptions
in the data base, the range of values for each variable, the domain structure
for each variable, and the weight which should be given to each variable. The
first number in this file must be the number of variables in each event
description. The next three specifications (range, structure, weight) are all
in the same form: a number, optionally followed by a set of numbers. The first
number may be used to set all values of the range, structure, or weight to a
single value. If all values in a specification are to be set to one value, then
this first number should be this value and there is no following set of values.
If a value is to be specified for each variable, the first number should be 0
and is followed by one value per variable. For example, suppose there are 3
variables with the following situation:


                x1         x2         x3

max. value      3          2          4
structure       interval   nominal    interval
weight          5          6          3


The  file  PARMSX  would  look  like  this: 

3     0 3 2 4     0 3 0 3     0 5 6 3

      max value   structure   weights

The first 3 indicates 3 variables. The first 0 indicates that the
range will be specified for each variable (a value other than 0 would indicate
that all variable ranges will be of that value). The second 0 indicates that
the structure of all variables will be individually specified. An interval
structure is specified by the number 3; any other number gives a nominal
structure. The final 0 indicates that weights will be specified independently for
each variable.

Here  is  another  example:   suppose  there  are  4  variables  in  the 
data  base  with  the  following  characteristics: 


                x1         x2         x3         x4

max. value      3          3          3          3
structure       interval   interval   interval   interval
weight          6          5          1          1


Then the PARMSX file would look like this:


4   3   3   0 6 5 1 1
The 4 indicates that there are 4 variables describing objects in
the data base. The next two 3's indicate that all ranges are from 0 to 3 and
that all variables are of interval structure. The 0 indicates that weights will
be specified independently.
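
A reader for this layout can be sketched as below. It is a hedged illustration rather than the ESEL input routine: `parse_parmsx` is a hypothetical name, and the sketch assumes that the file holds whitespace-separated integers in the order number of variables, range, structure, weight, with structure code 3 meaning interval and any other code meaning nominal, as described above.

```python
def parse_parmsx(text):
    """Parse PARMSX-style content: n, then range, structure and weight
    specifications, each either one shared value or 0 followed by n values."""
    tokens = iter(int(t) for t in text.split())
    n = next(tokens)

    def spec():
        first = next(tokens)
        if first != 0:
            return [first] * n                   # one value for all variables
        return [next(tokens) for _ in range(n)]  # one value per variable

    max_values = spec()
    codes = spec()
    weights = spec()
    structures = ["interval" if c == 3 else "nominal" for c in codes]
    return {"n": n, "max_values": max_values,
            "structures": structures, "weights": weights}


# The two examples from the text, written with whitespace between numbers.
print(parse_parmsx("3  0 3 2 4  0 3 0 3  0 5 6 3"))
print(parse_parmsx("4  3  3  0 6 5 1 1"))
```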

INST  FILE 

This file contains information about the number of events in each
class in the data base and the number of events from each class the program is
to select. Each class is specified by two lines in this file. The first line
specifies the number of events in the data base which correspond to the class;
the second line specifies the number of partitions and the total number of
events which are to be selected from the class. A class with 0 events, 0
partitions, and 0 selected events terminates the file.

For  example,  a  data  base  with  100  events,  the  first  50  of  which  are 
to  be  in  the  first  class,  the  next  20  in  the  second  class  and  the  last  30  in 
the  third  may  be  specified  as  follows: 


INST:

blank   -  the first line must be blank
50      -  the first 50 events are in the first class
1 5     -  using 1 partition, select 5 events
20      -  the second class has 20 events
1 6     -  using 1 partition, select 6 events
30      -  the last class has 30 events
3 20    -  using 3 partitions, select 20 events
0
0 0     -  the last class


EVNT  FILE 

This  file  contains  the  actual  data  base.  Events  are  stored  as  lists 
of  integers.  Irrelevant  values  are  stored  as  -1.  The  first  line  of  this  file 
must  be  blank. 

For example, a situation with 5 events and 3 variables:

EVNT FILE

blank
0 -1 3
2 2 3
3 3 1
3 1 1
0 1 0


OUTPUT  FILES 

OFILEX  -  A  file  with  the  selected  events. 

TOPT   -  A  file  with  the  remaining  events  which  were  not  selected. 

Each file is in the same form as the input file EVNT except that a class
number is appended to the beginning of each event. This output format is compatible
with the VL1 mode of the INDUCE-1 program (Larson, Michalski 77; Larson 77a,b)
and with the program AQ11.

2.  INCREMENTAL  GENERATION  AND  TESTING  OF  VL1  HYPOTHESES:   Program  AQ11

2.1  Introduction

There are many situations when one starts with certain initial
hypotheses about given data and then, in the process of experimenting with
these hypotheses, has to modify them in order to preserve consistency with newly
acquired facts. Such situations arise, e.g., in rule-based expert systems, where
in the course of a system's performance some rules are discovered to be incorrect
or incomplete and have to be modified.

A process of generating hypotheses (or descriptions) in steps, where
each step starts with certain working hypotheses and a set of (new) data and
ends with appropriately modified hypotheses, is called an incremental (or
multi-step) generation of hypotheses.

The purpose of program AQ11 is to implement such an incremental
generation of hypotheses in the framework of the variable-valued logic system
VL1 (Michalski 74). Although this framework is extremely restricted from the
viewpoint of the complexity of real scientific research,
it is still sufficiently rich to provide an interesting research subject and,
also, to obtain solutions which may have practical applications.

Hypotheses are expressed here as (constant-free) disjunctive
normal VL1 expressions (DVL1 expressions*). A DVL1 expression is a disjunction
of terms, where a term is a logical product of selectors. A selector is a
statement in the form:

[x # R]

where

x is a unary descriptor (variable),

# denotes any of the relational operators =, ≠, ≥, ≤,

R is a list of constants which are elements of the domain of x (R is called
the reference of the selector).

*In the general case, DVL1 expressions involve constants and are multiple-valued logic
expressions [Michalski 74]. Here, for simplicity, we will assume initially that
they are just binary (i.e., either satisfied or not satisfied), and have no constants.
When a DVL1 expression is evaluated for a given event, selectors are
interpreted as conditions (or questions). A selector is satisfied if the value of the
variable in the event satisfies the condition; otherwise, it is not satisfied.
Some examples of selectors and their interpretation as conditions follow:

$[x_i = 1]$         is $x_i$ equal to 1?

$[x_i = 1,3]$       is $x_i$ equal to 1 or 3?

$[x_i = 1..3]$      is $x_i$ between 1 and 3, inclusively?

An example of a term:

$[x_1 = 3][x_3 = 2,4,5][x_5 = 0]$

The above term is satisfied if $x_1$ equals 3, $x_3$ has value 2, 4, or 5 and
$x_5$ has value 0.

An example of a DVL1 formula:

$$T_1 \vee T_2 \vee T_3$$

where $T_1$, $T_2$, $T_3$ are terms. The formula is satisfied if term $T_1$ or $T_2$
or $T_3$ is satisfied.

A DVL1 formula is interpreted as a description of a set of
events, namely events which satisfy it.
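
The evaluation rule for the binary, constant-free case can be illustrated with a short sketch. The representation is an assumption made for the example only: a term is a dictionary mapping a zero-based variable index to the set of values in the selector's reference, only the '=' operator is modelled (a range such as 1..3 is written out as the set {1, 2, 3}), and a formula is a list of terms.

```python
def selector_satisfied(var, reference, event):
    """[x_var = R] holds when the event's value for x_var is listed in R."""
    return event[var] in reference


def term_satisfied(term, event):
    """A term (logical product) holds when every selector in it holds."""
    return all(selector_satisfied(v, ref, event) for v, ref in term.items())


def formula_satisfied(formula, event):
    """A DVL1 formula (disjunction) holds when at least one term holds."""
    return any(term_satisfied(t, event) for t in formula)


# The term [x1 = 3][x3 = 2,4,5][x5 = 0], using indices 0, 2 and 4.
term = {0: {3}, 2: {2, 4, 5}, 4: {0}}
print(term_satisfied(term, (3, 0, 4, 1, 0)))   # all three selectors hold
print(term_satisfied(term, (2, 0, 4, 1, 0)))   # the selector on x1 fails
```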


2.2.   Description  of  Methodology 

Suppose there is given a set of hypotheses (DVL1 descriptions),
$V = \{V_i\}$, i = 1, ..., m, and a family of event sets ('facts'), $F = \{F_i\}$, which these
hypotheses are supposed to describe. Suppose that for any i, $V_i$ describes
correctly only a part of the events from $F_i$.

The problem is to produce a new set of hypotheses, $V' = \{V'_i\}$, where
each $V'_i$ describes all events from set $F_i$, and does not describe events from
any other event set $F_j$, $j \ne i$.

The following solution to this problem is based on the multiple
application of a computer program implementing an efficient algorithm [Michalski 71]
for determining a cover, $C(E_1/E_0)$, of an event set $E_1$ against the event set $E_0$.
Such a cover can be interpreted as a DVL1 expression which is satisfied by every
event in $E_1$ and not satisfied by any event in $E_0$ (or in $E_0 \setminus E_1$, if $E_0$ and $E_1$ intersect).

The  solution  consists  of  3  major  steps: 

Step 1.   The first step isolates those facts which are not consistent
with the given hypotheses. For each hypothesis, two sets
are created:

$F^+$  -  a set of events which should be covered by the hypothesis,
but are not

$F^-$  -  a set of events which are covered by the hypothesis, but
should not be covered.

(An event is said to be covered by a hypothesis if the event
satisfies the VL1 formula which represents the hypothesis.)
Specifically, this step determines, for each i, i = 1, 2, ..., m, the sets*:

$$F_i^+ = F_i \setminus V_i \tag{8}$$

$$F_{ij}^- = V_i \cap F_j, \quad j = 1, 2, \ldots, m;\ j \ne i \tag{9}$$

(see Figure 3).

Thus, $F_i^+$ denotes events which should be covered by $V_i$ but are not, and
$F_{ij}^-$ denotes 'exception' events, i.e., events in $F_j$, $j \ne i$, which are
covered by $V_i$, but should not be covered.
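
Step 1 amounts to two set computations per hypothesis, which can be sketched as follows. The sketch is illustrative only: `covers` and `step1` are hypothetical names, a formula is assumed to be a list of terms, each term a dictionary from a zero-based variable index to a set of admissible values, and each fact set a list of event tuples.

```python
def covers(formula, event):
    """True if the event satisfies at least one term of the formula."""
    return any(all(event[v] in ref for v, ref in term.items())
               for term in formula)


def step1(hypotheses, facts):
    """Return (f_plus, f_minus) for hypotheses V_i and fact sets F_i.

    f_plus[i]       holds the events of F_i that V_i fails to cover.
    f_minus[(i, j)] holds the events of F_j, j != i, that V_i covers.
    """
    m = len(hypotheses)
    f_plus = [[e for e in facts[i] if not covers(hypotheses[i], e)]
              for i in range(m)]
    f_minus = {(i, j): [e for e in facts[j] if covers(hypotheses[i], e)]
               for i in range(m) for j in range(m) if j != i}
    return f_plus, f_minus


# Two hypotheses over a single variable (illustrative data).
V = [[{0: {0, 1}}], [{0: {2}}]]
F = [[(0,), (1,), (3,)], [(2,), (1,)]]
f_plus, f_minus = step1(V, F)
print(f_plus)    # events each hypothesis misses
print(f_minus)   # exception events for each ordered pair (i, j)
```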

Step 2.   This step determines, for each i, a generalized formula $V_i^-$
describing all exception events (the union of sets $F_{ij}^-$, j = 1, 2, ..., m,
$j \ne i$). This is done by generating, for given i and each j, a cover of
$F_{ij}^-$ against the events in the sets $V_i \cup F_i^+$, i = 1, 2, ..., m:

$$V_{ij}^- = C\Bigl(F_{ij}^- \Bigm/ \bigcup_{i=1}^{m} (V_i \cup F_i^+)\Bigr) \tag{10}$$

and then taking the logical union of the $V_{ij}^-$:

$$V_i^- = \bigvee_{\substack{j=1 \\ j \ne i}}^{m} V_{ij}^- \tag{11}$$

*Used as a set in (8) and (9), $V_i$ denotes the set of events covered by formula $V_i$.
Figure 3. Illustration of sets $F_i^+$ and $F_{ij}^-$.
The reason for this step is that it is computationally more efficient
to use formulas $V_i^-$ than the union of the sets $F_{ij}^-$, j = 1, 2, ..., m; $j \ne i$.

Step 3.   New 'correct' hypotheses could be obtained now by 'subtracting'
from each $V_i$ the formula $V_i^-$ and 'adding' to it the set $F_i^+$. To do this
directly, however, is difficult. Again, advantage is taken of the
available computer program for generating covers $C(E_1/E_0)$.
Namely, the new hypotheses, $V'_i$, i = 1, 2, ..., m, are determined as covers:

$$V'_i = C\Bigl(F_i \Bigm/ \bigcup_{\substack{k=1 \\ k \ne i}}^{m} \bigl[(V_k \setminus V_k^-) \cup F_k\bigr]\Bigr) \tag{12}$$

(The point is that directly simplifying a union of terms is difficult,
but 'subtracting' a term from a term or generating a cover of an
event set against a DVL1 formula is easier.)

Step 4.   This step determines the final representation of hypotheses
$V'_i$. The $V'_i$ are DVL1 expressions which are unions of terms. Some terms
in a $V'_i$ may represent (cover) only a few events in $F_i$. Such 'low
weight' terms are replaced by the events (facts) themselves (since an
event takes less memory than a term). In program AQ11, parameter
PUNY specifies the minimum fraction of events which a term has to cover
to be a 'high weight' term.

For example, if PUNY = 0.02, and a set $F_i$ has 100 events, then all
terms which cover 3 or more events (3 > 0.02 × 100) are 'high weight'
terms. Terms which cover 1 or 2 events are replaced with those events.
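
The PUNY rule can be sketched as a simple filter. This is an assumption-laden illustration, not the AQ11 code: `apply_puny` is a hypothetical name, terms are dictionaries from a zero-based variable index to a set of admissible values, and the comparison is strict, following the example above in which a term covering exactly 2 of 100 events is not 'high weight' for PUNY = 0.02.

```python
def term_covers(term, event):
    """True if the event satisfies every selector of the term."""
    return all(event[v] in ref for v, ref in term.items())


def apply_puny(terms, events, puny):
    """Split terms into high-weight terms and the events that replace
    the low-weight ones."""
    threshold = puny * len(events)
    kept, replacement_events = [], []
    for term in terms:
        covered = [e for e in events if term_covers(term, e)]
        if len(covered) > threshold:          # 'high weight' term, keep it
            kept.append(term)
        else:                                 # store the events instead
            for e in covered:
                if e not in replacement_events:
                    replacement_events.append(e)
    return kept, replacement_events


# 100 one-variable events; one term covers 3 of them, another covers 2.
events = [(i,) for i in range(100)]
terms = [{0: {0, 1, 2}}, {0: {10, 11}}]
print(apply_puny(terms, events, puny=0.02))
```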

2.3  An alternative way of handling exception events

In the procedure above, the exception events were represented by terms
in $V_i^-$. If the number of exception events is small, it can be easier to handle the
events without turning them into expressions $V_i^-$. The 'subtraction' (denoted by \)
of an event e from a term T (in a given formula) is done by logically multiplying
the term by the negation of the event:

$$T \setminus e = T \wedge \neg e \tag{13}$$

In order to use this way of handling exception events, in program
AQ11 the parameter STGY should be set to value 2 (STGY=2).

The result of operation (13) can produce several terms. Any one of them
is sufficient to be used in the new hypothesis. In program AQ11, there is a
parameter //EX which specifies how many such terms a user wants to store for representing
a hypothesis. If the number of generated terms is larger than //EX, the program
selects //EX 'best' terms according to the criteria list.
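
Why operation (13) can yield several terms is easiest to see by expanding the negation of the event: the event is a product of selectors $[x_i = e_i]$, so its negation is a disjunction of selectors $[x_i \ne e_i]$, and multiplying T by each of them gives one candidate term per variable. The sketch below illustrates this under assumptions of its own: `subtract_event` is a hypothetical name, terms are dictionaries from a zero-based variable index to a set of admissible values, and the full domain of every variable is supplied explicitly.

```python
def subtract_event(term, event, domains):
    """Expand T AND NOT e into a list of terms, one per variable whose
    reference can be narrowed to exclude the event's value."""
    result = []
    for i, value in enumerate(event):
        reference = term.get(i, domains[i])   # no selector means full domain
        reduced = set(reference) - {value}
        if reduced:                           # drop empty (unsatisfiable) terms
            new_term = dict(term)
            new_term[i] = reduced
            result.append(new_term)
    return result


def term_covers(term, event):
    """True if the event satisfies every selector of the term."""
    return all(event[v] in ref for v, ref in term.items())


# Term [x1 = 1,2] over domains {0,1,2} and {0,1}; the exception event is (1, 0).
domains = [{0, 1, 2}, {0, 1}]
pieces = subtract_event({0: {1, 2}}, (1, 0), domains)
print(pieces)
```

None of the resulting terms covers the exception event, and a selection among them (for instance by the criteria list mentioned above) decides which ones are stored.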

2.4  Additional  Features 

There may exist certain restrictions on the event space which must
hold in the resulting formulas. A restriction may be of the form

$[x_3 = 2] \to [x_1 = NA]$    (NA = not applicable)

which is read "if $x_3$ has the value 2 then the variable $x_1$ is not applicable."
The implementation of these restrictions can be viewed as an extra set of
hypotheses $V_{n+1}$, which is included in the set $E_0$ of all covers:

$$C(F_i / E_0 \cup V_{n+1})$$

Due to the techniques used in the covering algorithm (namely, the use of
parameter 'maxstar', see p. 27), this may not be the best approach, since only a
few terms in each intermediate quantity are retained. Therefore, the program
imposes these restrictions on all facts in the set $F = \{F_i\}$. Using the above
restriction, an event

e = (1 3 2)

is replaced with

e = (NA 3 2)
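
Imposing a restriction on the facts can be sketched as a rewrite of each event. The sketch assumes that the restriction reads "if $x_3$ = 2 then $x_1$ is not applicable", which is what the replacement of (1 3 2) by (NA 3 2) implies; the function name and the use of the string "NA" as a marker are hypothetical.

```python
NA = "NA"  # marker for 'not applicable' (illustrative choice)


def apply_restriction(event, cond_var, cond_value, target_var):
    """If the event has cond_value for variable cond_var, mark target_var
    as not applicable; otherwise return the event unchanged.
    Variable indices are zero-based."""
    if event[cond_var] == cond_value:
        return tuple(NA if i == target_var else v
                     for i, v in enumerate(event))
    return event


# [x3 = 2] -> [x1 = NA] applied to the event (1 3 2).
print(apply_restriction((1, 3, 2), cond_var=2, cond_value=2, target_var=0))
```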

2.5  Testing Procedure

By applying the part of the AQ11 program described above, one can determine
DVL1 descriptions (hypotheses) of classes of objects from examples of objects
representing individual classes. An obvious problem arises of testing the validity
of the derived descriptions. This is done by applying the descriptions to new
examples of objects with known class membership. The results of such testing are
usually represented in the form of a confusion matrix. This matrix specifies, for
each class (a row in the matrix), the numbers of testing objects of this class
which were assigned by the descriptions to individual classes (corresponding to
columns of the matrix).

Below is an example of a confusion matrix involving 2 classes: a
class of cancer cells and a class of non-cancer cells:

Class                       Assigned Decision
(Correct Assignment)        Cancer cells    Non-cancer cells

Cancer cells                     28                 2
Non-cancer cells                  7                23

Entries on the diagonal indicate correct decisions; entries outside the
diagonal indicate incorrect decisions. For example, the number 7 in the second row
indicates that 7 (testing) non-cancer cells were classified incorrectly as cancer cells.
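
As an illustration of how such a matrix is tallied, the sketch below builds it from (correct class, assigned class) pairs and computes the fraction of correct decisions. The function names and the pair representation are assumptions made for this example; the counts are those of the cancer-cell example.

```python
from collections import Counter


def confusion_matrix(pairs, classes):
    """Rows are correct classes, columns are assigned classes."""
    counts = Counter(pairs)
    return [[counts[(true, assigned)] for assigned in classes]
            for true in classes]


def fraction_correct(matrix):
    """Sum of the diagonal divided by the total number of testing objects."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


classes = ["cancer", "non-cancer"]
pairs = ([("cancer", "cancer")] * 28 + [("cancer", "non-cancer")] * 2
         + [("non-cancer", "cancer")] * 7 + [("non-cancer", "non-cancer")] * 23)
matrix = confusion_matrix(pairs, classes)
print(matrix, fraction_correct(matrix))
```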

This form of confusion matrix is adequate if an event (object) either
satisfies or does not satisfy a formula. In general, however, it is desirable to
consider the degree to which a given event e satisfies or matches a formula V. Such
a degree, called the degree of consonance (or degree of match) and denoted DC(V,e),
is computed according to an evaluation scheme. An evaluation scheme consists
of definitions for computing:

(1)  DC(S,e) - a degree of consonance between a selector and an event (briefly,
degree of consonance of a selector),

(2)  DC(T,e) - a degree of consonance of a term (a product of selectors),

(3)  DC(V,e) - a degree of consonance of a DVL1 formula (union of terms),

(4)  DC({V_i},e) - a degree of consonance of a set of formulas (describing the
same class).

Many different evaluation schemes can be applied for evaluating
DVL1 formulas. Methods developed in many-valued logic (e.g., Rescher 69) and
fuzzy reasoning (e.g., Zadeh 74, Gaines 76) are applicable here. We will
describe the evaluation scheme currently implemented in program AQ11
and give suggestions for other evaluation schemes.

(1)  Definition of degree of consonance of a selector.

The basic definition of the degree of consonance, DC(S,e), of a
selector comes from the evaluation rules in VL1 [Michalski 74]. Assuming
that the output domain of the formulas is D = {0,1}, we have:

$$
DC(S,e) = \begin{cases}
1, & \text{if the value of the appropriate variable in } e \text{ satisfies the selector } S \\
0, & \text{if it does not satisfy } S \\
*, & \text{if the value is unspecified}
\end{cases}
$$

For example, suppose event e = (x1, x2, x3) = (3, 1, 1), and selector S is
[x_i = 1,3], where x_i is a variable with the value 1 in e. We have DC(S,e) = 1,
because the value of x_i in e is a member of the reference of the
selector (i.e., 1 is a member of {1,3}) (Fig. 4).
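
The basic selector evaluation can be sketched as below. This is an illustration under stated assumptions rather than the AQ11 code: a selector is represented by a 0-based variable index and a set of admissible values (its reference), and the character `"*"` stands for an unspecified value.

```python
UNSPECIFIED = "*"  # assumed marker for an unspecified variable value


def dc_selector(var_index, reference, event):
    """Degree of consonance of a selector [x_var = reference] with an event.

    Returns 1 if the event's value is in the reference, 0 if it is not,
    and "*" if the value is unspecified.
    """
    value = event[var_index]
    if value == UNSPECIFIED:
        return UNSPECIFIED
    return 1 if value in reference else 0


event = (3, 1, 1)
print(dc_selector(1, {1, 3}, event))  # variable with value 1, reference {1,3}
```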

Alternative evaluation schemes can take into consideration the
structure of the domain of the variable in the selector. If a variable is linear,
the above definition of DC(S,e) seems too rigid. For example, if a
linear variable x_i = 13 and S: [x_i = 14..18], the selector is evaluated to 0, while
it seems desirable to evaluate it to some value greater than 0 (since 13 is so
'close' to 14). This means that one could accept a 'bell-shaped' function for
evaluating interval selectors (Fig. 5).

The concept of 'trimming' a term can also be useful here. In an untrimmed
(extended) term, references of selectors (sets of values) are as large as possible
without leading to a contradiction, i.e., an intersection with formulas of different
classes. In the trimmed term, references are as small as possible, provided that the
term still covers the same learning events as the extended term and preserves the type
of selectors; e.g., if the reference of a linear selector is a..b, then in
the trimmed selector it will be an interval a'..b', a ≤ a' ≤ b' ≤ b.

[Figure 4.  A graphical illustration for evaluating selector [x_i = 1,3].]

[Figure 5.  A bell-shaped (A) versus step-shaped (B) function for evaluating a
linear selector [x_i = 3..5]; DC is plotted against variable values 0 through 8.]

An evaluation function can assign DC = 1 when a variable has a value
within the 'trimmed' reference, DC = 0 when the variable has a value
outside the extended reference, and DC = B, 0 < B < 1, otherwise
(Fig. 6 a and b).
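
A minimal sketch of such a three-level evaluation function is given below. The function name and the choice B = 0.5 are assumptions for illustration; the text only requires 0 < B < 1. The references used are those of the nominal selector in Fig. 6b.

```python
def dc_trimmed(value, extended, trimmed, b=0.5):
    """Three-level evaluation of a selector using trimming.

    1.0 inside the trimmed reference, b between the trimmed and the
    extended reference, 0.0 outside the extended reference.
    """
    if value in trimmed:
        return 1.0
    if value in extended:
        return b
    return 0.0


extended = {1, 2, 4, 5}   # generalized (extended) reference
trimmed = {2, 5}          # trimmed reference, a subset of the extended one
print([dc_trimmed(v, extended, trimmed) for v in range(6)])
```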

(2)  Definition of the degree of consonance of a term.

In the definition of VL1, the degree of consonance of a term was defined as the
minimum of the values of the selectors in the term. In AQ11, the degree of
consonance of a term is computed as the ratio of the number of selectors
satisfied in the term to the total number of selectors in the term.

If all selectors in a term are satisfied, then both definitions
give the same value. If this is not the case, the latter definition, unlike
the former, distinguishes between terms with different numbers of
satisfied selectors, which is a desirable feature.

As an alternative, one could also use here a probabilistic logic
evaluation, which evaluates a term to the arithmetic product of the DC-s of
its selectors.
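
The three term evaluations mentioned above can be compared side by side. The sketch below assumes that the selector values of a term are already available as a list of numbers in [0,1]; the function names are chosen for this example only.

```python
from math import prod


def dc_term_min(selector_dcs):
    """VL1 definition: minimum of the selector values."""
    return min(selector_dcs)


def dc_term_ratio(selector_dcs):
    """AQ11 definition: satisfied selectors / all selectors in the term."""
    satisfied = sum(1 for dc in selector_dcs if dc == 1)
    return satisfied / len(selector_dcs)


def dc_term_product(selector_dcs):
    """Probabilistic alternative: arithmetic product of the selector values."""
    return prod(selector_dcs)


one_missed = [1, 1, 0]   # two of three selectors satisfied
two_missed = [1, 0, 0]   # one of three selectors satisfied
for dcs in (one_missed, two_missed):
    print(dc_term_min(dcs), dc_term_ratio(dcs), dc_term_product(dcs))
```

With 0/1 selector values, the minimum and the product are both 0 for the two partially satisfied terms, while the ratio separates them (2/3 versus 1/3).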

(3)  Definition of the degree of consonance of a formula.

The degree of consonance of a formula V and an event e, DC(V,e),
is defined as the maximum of the degrees of consonance DC(T_i,e) of the terms T_i
in the formula (i.e., as defined in VL1):

$$DC(V,e) = \max_{T_i \in V} \{DC(T_i,e)\}$$

[Figure 6.  a. Evaluation function for an interval selector using the concept of
'trimming' (generalized selector S: [x_i = 1:3], trimmed selector S': [x_i = 2]).
b. Evaluation function for a nominal selector using the concept of 'trimming'
(generalized selector S: [x_i = 1,2,4,5], trimmed selector S': [x_i = 2,5]).
In both plots DC takes the values 1, B and 0.]

(4)  Definition of the degree of consonance of a set of formulas of the same class.

It is sometimes useful to generate more than one formula describing
a given class. The reason is that having more than one formula per class may
improve the reliability of decision making. We have accepted in AQ11 the
definition that the degree of consonance DC({V_i},e) of a set of formulas
{V_i} is the average of the DC-s of the formulas in the set:

$$DC(\{V_i\},e) = \operatorname{AVG}_{V_i} \{DC(V_i,e)\}$$
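
Definitions (3) and (4) compose directly: a formula takes the maximum over its terms, and a set of formulas of one class takes the average over its formulas. A short sketch, assuming term values have already been computed (the names and the sample numbers are illustrative only):

```python
def dc_formula(term_dcs):
    """DC of a formula: maximum of the DC-s of its terms."""
    return max(term_dcs)


def dc_formula_set(formula_dcs):
    """DC of a set of formulas of one class: average of the formulas' DC-s."""
    return sum(formula_dcs) / len(formula_dcs)


# Two formulas describing the same class, each given by its term values.
formula_a = [0.5, 1.0, 0.25]
formula_b = [0.5, 0.0]
per_formula = [dc_formula(formula_a), dc_formula(formula_b)]
print(per_formula, dc_formula_set(per_formula))
```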

Given a set of formulas of different classes and an event, the DC is
computed between the formula (or set of formulas) of each class and the event.
The classification decisions are ordered according to the value of DC. All
decisions with a DC value within the distance TAU (see parameter TAU in section 2.6.1)
of the maximum value of DC are rank 1 decisions (i.e., each of
these decisions is treated as equally justified). Then the next 'best'
decision which is not of rank 1 is selected, and all decisions with DC within
the distance TAU of the DC of the selected decision are assigned rank 2. The
process repeats IRK times (see input parameter IRK in section 2.6.1).
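
The ranking procedure just described can be sketched as follows. This is one possible reading of the text, not the AQ11 source: the function name, the dictionary input, and the use of `<=` for "within the distance TAU" are assumptions.

```python
def rank_decisions(dc_by_class, tau, irk):
    """Assign ranks 1..irk to classes according to their DC values.

    All decisions within `tau` of the best remaining DC share a rank;
    classes that do not receive a rank within `irk` rounds are omitted.
    """
    remaining = sorted(dc_by_class.items(), key=lambda kv: kv[1], reverse=True)
    ranks = {}
    rank = 1
    while remaining and rank <= irk:
        best = remaining[0][1]
        for cls, dc in remaining:
            if best - dc <= tau:
                ranks[cls] = rank
        remaining = [(cls, dc) for cls, dc in remaining if cls not in ranks]
        rank += 1
    return ranks


print(rank_decisions({"D1": 1.0, "D2": 0.5, "D3": 1.0}, tau=0.1, irk=2))
```

In this example D1 and D3 tie at rank 1, so the event contributes two rank 1 decisions.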

For  each  testing  event,  values  of  DC  of  ranked  decisions  are  printed 
by  the  program  AQ11   as  rows  in  an  'extended'  confusion  matrix  (see  Fig.  7 
for  an  example) .   In  the  matrix,  a  decision  of  rank  1  which  is  correct  is 
underlined,  and  the  number  of  rank  1  decisions  for  the  given  event  is  printed 
in  the  'TIES'  column.   If  an  event  has  some  unspecified  values,  it  may  still 
be  possible  to  compute  DC  for  certain  classes,  and  for  certain  classes  DC  would 
depend  on  the  value  of  unspecified  variables.  If  a  decision  of  rank  1 
is  correct  and  would  remain  so  no  matter  what  values  unspecified  variables 
take,  then  the  decision  is  treated  as  a  correct  decision.   In  any  other  case,  the 
decision  is  excluded  from  computing  total  correctness  statistics,  and  the  column 
'UNSP'  corresponding  to  the  given  event  has  entry   *  . 

The performance statistics for each group of events of one class are
printed in the last 2 rows of the group of rows associated with these events.
The rows contain the number and percentage, respectively, of events classified
to each class. The percentage of correct decisions is braced by
vertical bars. The matrix also contains a column #DEC/Event specifying an
'indecision ratio', which is the ratio of all decisions of rank 1 to the total
number of events in the group (excluding rows with UNSP=*). Figure 7 gives
an example of an extended confusion matrix. The matrix was computed for 3 classes
D1, D2 and D3 described by formulas:

D1:  [x1=2][x3=1][x4=1]
D2:  [x1=2][x2=0]
D3:  [x2=1][x4=1]

and for 3 testing events of class D1:

e1:  (1 0 1 1)
e2:  (2 1 1 1)
e3:  (* 0 1 1)    (* denotes unspecified value)

The parameters were: TAU=0.1 and IRK=2:


Correct    Assigned Decision
Assign     D1       D2       D3       #DEC/Event   TIE   UNSP

D1         .68      .50      .50
           1.00     .50      1.00                  2
                             0.50                        *

           2        0        1        1.33
           |67%|    0%       33%

The percentages braced by vertical bars indicate the correct decisions.

An example of an extended confusion matrix
Figure 7

The value of #DEC/Event is 1.33 because there were 4 decisions of rank 1
(including the row with UNSP=*) and 3 events (4/3 = 1.33).
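
The arithmetic behind the #DEC/Event entry can be written out as a small sketch. The per-event counts of rank 1 decisions (1, 2 and 1) are an assumption chosen to add up to the 4 decisions quoted above; the function name is hypothetical.

```python
def indecision_ratio(rank1_counts):
    """Number of rank 1 decisions divided by the number of events."""
    return sum(rank1_counts) / len(rank1_counts)


# One entry per testing event: how many decisions received rank 1.
rank1_counts = [1, 2, 1]
print(round(indecision_ratio(rank1_counts), 2))
```

A ratio of 1.0 means every event received exactly one rank 1 decision; larger values indicate ties.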

Concluding, the program AQ11 permits a user to determine decision
formulas in an incremental way and then automatically test them on the testing
data. Thus, it is a 'complete' tool for making experiments in inductive learning
of DVL1 descriptions of data.

2.6  Program  User's  Guide

The program requires two sets of input parameters: control
parameters (read in PL/1 data format) and data parameters (read in PL/1
list format). All but two control parameters have default values; therefore,
only two (NV, NCL) must be specified. All data parameters must be
specified (although some may be omitted if certain control parameters
are set). The following is a list of all parameters which the program
currently accepts. Default values (if any) are given in the examples.

2.6.1  Control  parameters 

•  NV  (no  default  value) 
Example:       NV  =  50 

Possible  values:   integer  in  the  range  1:50 

This  parameter  specifies  the  number  of  variables  which  are 
available  to  describe  each  event. 

•  NCL  (no  default  value) 
Example:       NCL  =  19 

Possible  values:   any  positive  integer 

NCL  specifies  the  number  of  classes  of  events  (or  event  sets) 
which  are  to  be  input  to  the  program.   The  program  will  then 
generate  a  cover  of  each  event  set. 

•  MAXSTAR 

Example:       MAXSTAR  =  10         (default  value) 

Possible  values:  any  positive  integer 

MAXSTAR  is  the  maximum  number  of  complexes  (terms)  which  are  kept 
in  any  intermediate  star  (see  AQ7  documentation  for  a  further 
description  [Larson,  Michalski  75]).   The  procedure  for  selecting 
'best  terms'  is  somewhat  different  in  this  program  than  in 
AQ7.   (When  each  intermediate  star  is  trimmed,  the  user-specified 
cost function is used rather than only the first two criteria).

•  NPASS

Example:       NPASS = 1  (default value)

Possible values:   any non-negative integer

If NPASS = 0, the program will only execute the confusion
matrix phase and then terminate (set TEST = '1'B).

If NPASS = 1, the program will form a cover of the facts
using input formulas (if any), then evaluate the formulas
if TEST = '1'B.

If NPASS > 1, the program will partition the set of facts into
sets whose size depends on the PCT parameter (see below) and form
hypotheses based on old hypotheses and the partitioned facts.
Then, new sets of events will be taken in turn and hypotheses
formed based on the entire set of events taken up to that point
and the hypotheses from the last pass.

•  TEST

Example:       TEST = '0'B  (default value)

Possible values:   '1'B or '0'B

If TEST is '1'B, then a confusion matrix will be computed
after each pass. The testing events must be given to the
program in the file TESTF.

•  RESTRICT

Example:       RESTRICT = '0'B  (default value)

Possible values:   '1'B or '0'B

If RESTRICT is '1'B, then a set of restrictions is accepted
by the program (see parameter REST) and applied to all events.

•  RTEST

Example:       RTEST = '0'B  (default value)

Possible values:   '1'B or '0'B

If RTEST is '1'B, the restrictions will also be applied
to testing events.

•  TRANS

Example:       TRANS = '0'B  (default value)

Possible values:   '1'B or '0'B

If TRANS is set to '1'B, then the variable names and values
are translated into descriptive names in the output. In this
case, a file TRAN must be given to the program (see TRAN below).


•  PUNY

Example:       PUNY = 0.02  (default value)

Possible values:  real value in interval [0:1]

All terms which cover less than PUNY*100 percent of the
events of the corresponding set will be discarded in the next
pass (e.g., if a term covers 2 events, PUNY = .1, and there are
23 events in the training set, this term will not be used in the
next pass).
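
The PUNY test amounts to a simple coverage threshold. The sketch below assumes the reading of the example with PUNY = .1 (2 of 23 events is about 8.7 percent, below 10 percent); the function name is hypothetical.

```python
def keep_term(events_covered, events_in_set, puny=0.02):
    """Keep a term only if it covers at least PUNY*100 percent of the set."""
    return events_covered / events_in_set >= puny


print(keep_term(2, 23, puny=0.1))   # term from the example, discarded
print(keep_term(2, 23))             # same term under the default PUNY
```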

•  TAU

Example:       TAU = .019  (default value)

Possible values:   real values in interval [0:1]

This parameter relates to the computation of the confusion
matrix. Any two values (degrees of consonance) within TAU
of each other are considered to be of the same rank. For
example, if .98 is the highest decision value for a testing
event, any decision with a value between .96 and .98 would
be a rank 1 decision (assuming the default value of TAU).

•  IRK 

Example:       IRK  =  2  (default  value) 

Possible  values:  positive  integer  less  than  NCL 

This parameter also relates to the computation of the confusion
matrix and controls the number of decisions printed out. All
degrees of consonance of rank not greater than IRK are printed;
others are not printed. If IRK = 1, only rank 1 degrees are
printed. One exception to this is that the degree associated
with the correct decision is always printed.

•  NCRIT

Example:       NCRIT = 2  (default value)

Possible values:  integer in range [1:4]

NCRIT specifies the number of cost criteria which should be applied
when computing the cost of a formula (see CRIT). CRIT(1) through
CRIT(NCRIT) will be used; all others will be ignored.

•  CRIT(I)

Example:       CRIT(1) = 1    CRIT(2) = 2  (default value)

Possible values:  each CRIT(I) may have values 1, 2, 3, 5 and 9

CRIT(I) = J specifies that the I-th criterion in order will be the
cost function J. There should be NCRIT specifications indicating
the cost functions which will be used (J) and the order
in which they will be applied (I). Available cost functions
are the following:

1.  Maximize the number of events covered by the given term
    and not covered by previous terms

2.  Minimize the number of selectors

3.  Minimize the cost of all variables in this term. If
    this criterion is specified, costs of variables
    must also be specified (see Z parameter)

5.  Minimize the number of events of E0 covered

9.  Maximize the total number of events covered by a term

•  #EX

Example:       #EX = 1  (default value)

Possible values:  positive integer

During some phase of the program, exception terms are
formed (descriptions of events which are covered by hypotheses
but should not have been). #EX gives the number of redundant
exception terms (i.e., terms which cover the same
event) to be retained.

•  TR

Example:       TR = '0'B  (default value)

Possible values:   '1'B or '0'B

TR gives a trace of the multi-step process, giving the exception
terms and the size of the sets F+ and F- described in Section 2.1.

•  STGY

Example:       STGY = 1  (default value)

Possible values:  1 or 2

If STGY has the value 1, then exception terms are formed
for events in the sets F-. If STGY has the value 2, then the
previous hypotheses are multiplied by the complement of the
exception events of the set F-.

•  INDEP

Example:       INDEP = '0'B  (default value)

Possible values:   '0'B or '1'B

If INDEP is '1'B, then the number of independently covered
events is printed for each complex. Otherwise, only the number
of new events and the total number of events covered are
printed.

•  TITLE 

Example:       TITLE  =  0  (default  value) 

Possible  values:   non-negative  integer 

TITLE  specifies  the  number  of  cards  which  are  in  the  title. 
The  title  cards  must  follow  the  semi-colon  which  terminates 
the  set  of  control  parameters. 

•  OPT

Example:       OPT = '1'B  (default value)

Possible values:   '1'B or '0'B

If OPT is '1'B, then after each pass a table is printed indicating the
number of times each cost criterion is evaluated (the number of
terms for which the cost function is evaluated).

•  MODE

Example:       MODE = 'IC'  (default value)

Possible values:   'IC', 'DC', 'VL'

If MODE = 'IC', then covers are allowed to intersect over 'DON'T
CARE' areas of the event space. If MODE = 'DC', the covers are
constrained to be disjoint. MODE = 'VL' gives order-dependent
covers.

•  CPXEV

Example:       CPXEV = '1'B  (default value)

Possible values:   '1'B or '0'B

If this parameter is '1'B, then during the testing phase a
table is printed which gives the number of times each term
was needed to give a correct decision.

•  GEN

Example:       GEN = '1'B  (default value)

Possible values:   '1'B or '0'B

If this parameter is '1'B, then only the necessary parts of
the reference of each output complex are printed (i.e., a new
term is created from the generated term which has the following
properties):

a.  The new term covers the same events.

b.  The new term contains the same variables.

c.  The references in the new term are as small as possible.

•  ECHO

Example:       ECHO = 'ERZ'  (default value)

Possible values:  a string containing any of the characters ZERF

If a letter appears, the corresponding input data is echoed.

E = Events
R = Restrictions
Z = Variable costs
F = Input formulas

The default echoes events, restrictions and variable costs if
they are in the input.

•  TOLERANCE(I)

Example:       TOLERANCE(2) = 0.0  (default value)

Possible values:   integer or real in [0:1]

TOLERANCE(J) is the tolerance for the J-th criterion specified.
If it is an integer, then it is assumed to be an absolute tolerance.
Otherwise, it is a relative tolerance calculated as
TOLERANCE * (MAX-MIN), where MAX and MIN are the maximum and minimum
elements in the list of costs to be sorted.
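
The distinction between absolute and relative tolerance can be illustrated as below. This is a sketch of the rule as stated, with a hypothetical function name; how AQ11 itself distinguishes an integer from a real input value is not shown in the text, so the type test here is an assumption.

```python
def effective_tolerance(tolerance, costs):
    """Return the tolerance actually applied when comparing costs.

    An integer tolerance is used as given (absolute); a real tolerance
    is scaled by the spread of the costs being sorted (relative).
    """
    if isinstance(tolerance, int):
        return tolerance
    return tolerance * (max(costs) - min(costs))


costs = [1, 5, 9]
print(effective_tolerance(2, costs))     # absolute tolerance
print(effective_tolerance(0.25, costs))  # relative: 0.25 * (9 - 1)
```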

•  ORD

Example:       ORD = '1'B  (default value)

Possible values:   '1'B, '0'B

If ORD is '1'B, then the program will reorder events in E0,
in decreasing order, with regard to the distance from e.

•  N-TAU

Example:       N-TAU = 0  (default value)

Possible values:   integer in [0:8]

This parameter, if not zero, generates a TAU estimation table
giving summary information for each class in the evaluation
procedure using N-TAU values of TAU, beginning at 0 with increments
of TAU-INC.

•  TAU-INC

Example:       TAU-INC = .02  (default value)

Possible values:   real in [0:1]

This is the increment used in the TAU estimation table.

•  Semi-colon (;):   This must be entered to terminate the control
   parameters.


2.6.2  Data  parameters 

These parameters have the names as used in the program. In the
input to the program only their values are specified, in the order
given here (see Fig. B-1(a) for an example).

•  TITLEC

Possible values:   the number of lines specified by the TITLE
parameter

These lines are printed at the top of the output.


•  NSPEC 

Possible  values:  An  integer  in  the  range  [0:NV] 

Number  of  variables  for  which  a  structure  is  to  be  specified. 

•  VTYPE

Possible values:   'F', 'I'

The NSPEC variables will be of this type ('F' - nominal
variable, 'I' - linear variable).

•  TYPE

Possible values:   a list of NSPEC integers in the range [1:NV]

The list indicates the variables of type VTYPE.

Example of NSPEC, VTYPE, TYPE:     3 'F' 1 3 5

There are 3 variables of type 'F' (nominal), namely
variables 1, 3, and 5. The rest will have type 'I'.

•  NL

Possible values:  a list of NV positive integers in the range [1:8]

This parameter gives the number of values which each variable
can assume.

Example:       12  4

•  NE

Example:       3  1  4

Possible values:  a list of NCL integers in the range [0:NEVE]

The parameter specifies the number of events in each event
set. The sum should add up to NEVE.

•  NF

Example:       3  4  1

Possible values:  a list of NCL non-negative integers

This parameter specifies the number of terms of the hypothesis
for each event set.

•  PCT

Example:       .2  .4  1

Possible values:  a list of NPASS real values in range [0:1]
(except if NPASS = 1, PCT is assumed to be 1)

In this example, 20% of the events will be described first,
then an extra 20% of the events will be added and a description
formed using previous hypotheses. Finally, the complete set of
events is used (see NPASS above).
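
The cumulative reading of PCT in this example (20%, then 40%, then all events) can be sketched as follows. The function name is hypothetical, and taking events in input order with rounding to the nearest whole event are assumptions; the text does not say how AQ11 chooses which events enter each pass.

```python
def events_per_pass(events, pct):
    """Return, for each pass, the cumulative list of events used."""
    n = len(events)
    return [events[:round(p * n)] for p in pct]


events = list(range(10))           # ten placeholder events
passes = events_per_pass(events, [0.2, 0.4, 1])
print([len(p) for p in passes])    # sizes of the successive training sets
```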

•  REST

Example:       (x12 = 1) -> (x14 = *);
               (x13 = 2) -> (x1 = *)(x4 = 1).

Possible values:  a list of decision rules separated with semi-colons
and terminated with a period

These restrictions will be applied to all events (i.e., added to
current specifications). RESTRICT must be set to specify restrictions.
An * in the reference indicates that this variable is not
applicable. Restrictions are separated by semi-colons and the
list of all restrictions is terminated by a period.

•  EVENT

Possible values:  NEVE lists of events, NEVE = SUM(NE)

There are two ways in which events can be specified, and the
two types of specifications can be mixed.

1.  An event can be specified as a list of values, one value for
    each variable. The values can be:

    a)  a non-negative integer, indicating the value of the variable

    b)  -1, indicating that the variable does not apply

    c)  -2, indicating that the value is not known

    Example:       3  2  0  -1  -2  0
                   4  1  2   0   0  0

2.  An event can also be specified by a VL1 formula which is
    preceded by a line which says FORMULA. Each formula
    must be terminated by a semi-colon.

    Example:       FORMULA.
                   (x1 = 2)(x3 = 0);
                   FORMULA.
                   (x3 = 1)(x21 = 2);
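
For readers who want to pre-process event lists of the first kind, the coding of -1 and -2 can be decoded as in the sketch below. The function name and the symbols chosen for "not applicable" and "unknown" are assumptions made for this example; only the numeric codes come from the text.

```python
NOT_APPLICABLE = "NA"  # assumed symbol for code -1
UNKNOWN = "*"          # assumed symbol for code -2


def decode_event(line):
    """Turn one line of integer codes into a list of variable values."""
    special = {-1: NOT_APPLICABLE, -2: UNKNOWN}
    return [special.get(int(tok), int(tok)) for tok in line.split()]


print(decode_event("3 2 0 -1 -2 0"))
print(decode_event("4 1 2 0 0 0"))
```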

•  FORMULA

Possible values:  NCL lists of formulas, each having NF complexes

There are two ways to specify a formula:

1.  as a FORMULA, as in the event specification,

2.  as a binary positional bit string in PL/1 list format.

•  Z

Example:       Z(1,2) = 9    Z(3,4) = 57;

Possible values:  integer values terminated by a semi-colon, in
PL/1 data format

These are costs of the variables, which are accepted if CRIT(I) = 3
has been specified for the event set I. If a Z value is not specified
for some variable, it is assumed to be 1. Z(I,J) = Y means that
variable x_J has cost Y for event set I.

2.6.3  Files

•  TEST

This file must be included if the parameter TEST is set to '1'B.
The first line of this file contains a list of NCL values
indicating the number of test events for each event set.
The list of testing events follows. Each event is specified
as a list of variable values with the coding of -1 and -2 as
above.


•  TRAN

This file must be included if TRANS is '1'B. It contains the
names of all variables and variable values. Each name will be
truncated: variable names to 20 characters, value names to 10
characters. The format is the following. For each variable
one specifies:

variable name, variable value names

Each name must be in single quotes.

Example:

//TRAN DD *
'TEMPERATURE'  'COLD'  'MODERATE'  'WARM'
'HUMIDITY'  'DAMP'  'DRY'

•  TESTF

This is a temporary file which the program uses to store test
events from one pass to the next. See the JCL set-up for the
specification of this file.

This completes the description of the input specification to the
program AQ11. For a user's convenience, appendix A gives a summary of the input
specification. Appendix B gives an example of input and output from the
program.

2.6.4  Program  Output

Most of the output is self-explanatory (see appendix B). The input
data is echoed when specified. Then, the formulas for each pass are printed.
To the right of each term is a pair of numbers which specify the number
of new events covered and the total number of events covered by that term.

After all the formulas for one pass are printed, a confusion matrix
is printed for these formulas and the given testing data. Information about each
pass is printed in turn until all passes are complete.

If two events of different classes are identical, then a message
is printed indicating a non-disjoint representation of classes. In such a
situation, if a cover C(E1/E0) is being created, then the event of E0 is
ignored.

The output from the evaluation part of the program consists of an
extended confusion matrix, as described in section 2.5.

Two other tables are printed at the user's option. If CPXEV is
set, then a table listing the number of correct decisions for each complex
is given. If N-TAU is not zero, then a TAU estimation table is printed, giving the
indecision ratio and number of correct decisions for each class for N-TAU
values of TAU, beginning with 0 in increments of TAU-INC.

3.   SUMMARY

We have described here the underlying methodology and computer programs
for selecting 'best' learning VL1 events (program ESEL), and for incrementally
generating VL1 hypotheses for given event sets (e.g., selected by program ESEL)
and then automatically testing them on the supplied testing events (program AQ11).
These two programs constitute a package which can be used for making
experiments in induction of descriptions from examples in various applied fields.

APPENDIX  A


AQ11  Input  Specifications 


1.   ID parameters

Allow 150 to 180 K bytes of storage for large problems. A very small
problem may be run in 120 K. Very few IOREQ's are used by the program;
500K is more than enough. Time is the main variable which must be
adjusted. Using the following parameters, an estimate of the time
required for a large job can be given.

MAXSTAR = 1     NPASS = 3                 NV = 35
NCL = 19        NEVE (training) = 307     No evaluation
Time:  1 min.   Region:  174K

When MAXSTAR was changed to 7 and evaluation using 388 events was requested,
the time increased to 3 minutes for the training phase and 1 minute and 30
seconds for evaluation.


2.   JCL

The following JCL is recommended:

//   EXEC  PGM=ITCN3,REGION=180K,PARM='ISA(N),REPORT'
//STEPLIB   DD  DSN=USER.P2123.ITCN3,DISP=SHR
//SYSPRINT  DD  SYSOUT=A
//PLIDUMP   DD  SYSOUT=A
//SAVEF     DD  DSN=&&TEMP,UNIT=DISK,SPACE=(TRK,(10,1))
//FT06F001  DD  SYSOUT=A
//SYSIN     DD  *
      input parameters and data
//TEST      DD  *
      test data (if evaluation requested)
//TRAN      DD  *
      translation data (if TRANS is set)

ISA(N):   N  should  be  the  region  requested  minus  125, 
e.g.  51K 
Control parameters

Parameter       Default   Description

NV                        Number of variables
NEVE                      Total number of training events
NCL                       Number of classes
MODE            'IC'      Mode of operation
MAXSTAR         10        Maximum star size
ECHO            'ERZ'     Echo input
NCRIT           2         Number of criteria
CRIT(1)         1         Criterion 1
CRIT(2)         2         Criterion 2
CRIT(3)         5         Criterion 3
CRIT(4)         9         Criterion 4
TITLE           0         Number of lines in title
RESTRICT        '0'B      Accept restrictions
SAVE            '0'B      Save formulas in file SAVEF
GEN             '1'B      Trim complexes for output and evaluation
PUNY            .02       The minimum percent of events which have
                          to be covered by a term
TR              '0'B      Trace multi-step procedure
NPASS           1         Number of steps
STGY            1         Way in which events of F sets are handled
#EX             1         Number of redundant exception complexes
OPT             '1'B      Print statistics about number of times
                          each cost function is evaluated
TRANS           '0'B      Translate output using TRAN file
TEST            '0'B      Evaluate formulas
RTEST           '0'B      Apply restrictions to test events
TAU             .019      Equivalent threshold for rank 1 decisions
IRK             2         Number of ranked decisions which are printed
CPXEV           '1'B      Print statistics about satisfied complexes
                          during evaluation
NGE             200       Initial storage for complexes
INDEP           '0'B      Prints independent events if set
TOLERANCE(I)    0         Tolerance for Ith specified test function
N-TAU                     Number of columns in 'tau' estimation table
TAU-INC         .02       Increment in tau estimation table
ORD             '1'B      Reorder the events in EO, in decreasing order,
                          with regard to the distance from e
Semi-colon (;)            Terminate control parameters


Data parameters

Parameter       Description

TITLE           Lines of title (if any)
NSPEC           Number of variables for which type TYPE is specified
TYPE            Type of these variables
VSPEC           Indices of variables of type TYPE
PCT             If NPASS > 1, the percent of events to use in
                learning phase for each pass
NL              Number of values for each variable
NE              Number of events in each set
NF              Number of formulas in each set
RESTRICTIONS    If RESTRICT is set. Each pair of rules must be
                separated with a semi-colon; the entire list is
                terminated with a period.
EVENTS          Lists of events in either of two forms
FORMULAS        Lists of formulas in either of two forms
Z               If any CRIT(I) = 3, costs of variables terminated
                with semi-colon


5.   Files

File            Description

TRAN            If TRANS is '1'B, the file of names of classes,
                variables and variable values. Each name must be
                in quotes
TEST            If TEST is '1'B, the file of test events
SAVEF           If SAVE is '1'B, the output file of formulas in
                bit positioned form (list format)
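
The way defaults combine with an explicit parameter list can be illustrated with a short sketch. It is written in Python rather than PL/I and is not the program's own input routine: the names `DEFAULTS`, `parse_control` and `convert` are hypothetical, the default values are copied from the control-parameter table, and the parameters that have no listed default (NV, NEVE, NCL, N-TAU) as well as the indexed TOLERANCE(I) are simply left out.

```python
import re

# Defaults copied from the control-parameter table. Parameters with no
# listed default (NV, NEVE, NCL, N-TAU) and the indexed TOLERANCE(I)
# are omitted from this illustrative dictionary.
DEFAULTS = {
    "MODE": "IC", "MAXSTAR": 10, "ECHO": "ERZ", "NCRIT": 2,
    "CRIT(1)": 1, "CRIT(2)": 2, "CRIT(3)": 5, "CRIT(4)": 9,
    "TITLE": 0, "RESTRICT": False, "SAVE": False, "GEN": True,
    "PUNY": 0.02, "TR": False, "NPASS": 1, "STGY": 1, "#EX": 1,
    "OPT": True, "TRANS": False, "TEST": False, "RTEST": False,
    "TAU": 0.019, "IRK": 2, "CPXEV": True, "NGE": 200,
    "INDEP": False, "TAU-INC": 0.02, "ORD": True,
}

# NAME=VALUE, where VALUE is a quoted string, a bit string such as '1'B,
# or a bare number.
TOKEN = re.compile(r"([#A-Z][A-Z0-9()\-]*)\s*=\s*('[^']*'B?|[^\s;]+)")


def convert(raw):
    """Turn one raw value into a Python bool, str, float or int."""
    if raw.endswith("'B"):          # bit string: '1'B or '0'B
        return raw[1:-2] == "1"
    if raw.startswith("'"):         # quoted character string
        return raw.strip("'")
    return float(raw) if "." in raw else int(raw)


def parse_control(text):
    """Overlay the parameters given in `text` on the table defaults."""
    body = text.split(";", 1)[0]    # a semi-colon terminates the list
    params = dict(DEFAULTS)
    for name, raw in TOKEN.findall(body):
        params[name] = convert(raw)
    return params


# Control parameters of the example in Appendix B (item B of Figure B-1).
EXAMPLE = """NPASS=2 NV=4 MAXSTAR=30 CRIT(3)=3 CRIT(4)=9
TITLE=3 NEVE=12 ECHO='ERFZ' TEST='1'B TRANS='1'B
RESTRICT='1'B NCL=2 INDEP='1'B NGE=100 NCRIT=4
NTAU=4;"""

params = parse_control(EXAMPLE)
print(params["MAXSTAR"], params["TAU"], params["MODE"])
```

Parameters not mentioned in the input, such as TAU and MODE, keep their table defaults, which is consistent with the echoed values TAU=1.89999E-02 and MODE='IC' in Figure B-2.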
APPENDIX  B

An  Example  of  an  Input  to
and  an  Output  from  AQ11

This appendix contains an example of the program input and output
which involves most of the features of the program.  Figure B-1 gives the
input specification for this example.  Figure B-2 gives the output which was
obtained.  The first page of output repeats the input in a slightly extended
form.  The next pages show the formulas which were generated [in which variables
x1, x2, x3, x4 are replaced by their names, as defined in the input (item P in
Fig. B-1)] and the results of the evaluation of the formulas on testing events.

Explanation of Figure B-1.

The example involves four variables (NV=4; see item B in Figure B-1(a)),
which can take 2, 3, 4 and 2 values, respectively (item E).  All variables are
nominal, except variable x3, which is interval (item D).  There are 2 classes
(NCL=2; item B), each represented by 6 learning events (items F, J).  The
last event of set (class) 1 is specified as a DVL1 formula (in the middle of
item J).  Item H defines the percentage of learning events to be used in each
iteration (pass).  The restriction on event space is given by a VL1 decision
rule (item I).  There are 0 initial hypotheses for class 1 and 2 hypotheses for
class 2 (item G).  Item K (Figure B-1(b)) lists the hypotheses for class 2.  The
cost of variable 1 for set 1 is specified as 2 (item L).  The cost criteria for
the selection of complexes (terms) in the synthesis of covers are in the order
1, 2, 3, 9 (1 and 2 by default; 3 and 9 defined by CRIT(3)=3, CRIT(4)=9 in
item B).  (For the definition of cost of variables and cost criteria see
[Larson, Michalski 75].)

Evaluation of the formulas to be generated is requested (//TEST DD *),
and sets of test events are supplied, 4 events per class (items M, N).  A file
containing the names of each class (set), each variable and each value of the
variable is also supplied (items O, P).
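
As an illustration of how the events, the restriction (item I) and the initial hypotheses (item K) relate to one another, the following Python sketch encodes them as plain data. It is not part of AQ11: the helper names `covers` and `satisfies_restriction` are hypothetical, a complex is represented as a mapping from a variable index to the set of values it admits, and the restriction is assumed to be read as an implication.

```python
# Events are tuples (x1, x2, x3, x4). A complex maps a 1-based variable
# index to the set of values that the complex admits for that variable.
def covers(complex_, event):
    """True if every selector of the complex is satisfied by the event."""
    return all(event[var - 1] in values for var, values in complex_.items())


# Restriction of item I: (X1=0)(X2=0)(X3=0) -> (X4=0)
PREMISE = {1: {0}, 2: {0}, 3: {0}}
CONCLUSION = {4: {0}}


def satisfies_restriction(event):
    """Assumed reading: the rule only constrains events matching the premise."""
    return (not covers(PREMISE, event)) or covers(CONCLUSION, event)


# Initial hypotheses for class 2 (item K of Figure B-1(b)).
K1 = {1: {1}, 2: {1, 2}, 4: {0}, 3: {0, 1}}
K2 = {1: {1}, 3: {2, 3}}

# The six learning events of class 2 (second half of item J).
CLASS2_EVENTS = [
    (0, 2, 1, 1), (0, 0, 3, 0), (1, 2, 0, 0),
    (1, 1, 1, 0), (1, 0, 2, 1), (1, 2, 3, 0),
]

covered = [e for e in CLASS2_EVENTS if covers(K1, e) or covers(K2, e)]
print(len(covered), "of", len(CLASS2_EVENTS), "class 2 events are covered")
print(all(satisfies_restriction(e) for e in CLASS2_EVENTS))
```

Under these assumptions the two initial hypotheses cover only the class 2 events with x1 = 1, so the working hypothesis has to be modified to account for the remaining two events.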
Input for Example

A   //  EXEC PGM=ITCN3,REGION=160K,PARM='ISA(23K),REPORT'
    //STEPLIB  DD DSN=USER.P0012.ITCN3,UNIT=DISK,DISP=SHR
    //SYSPRINT DD SYSOUT=A
    //PLIDUMP  DD SYSOUT=A
    //SAVEF    DD SYSOUT=B
    //FT06F001 DD SYSOUT=A
    //TEST     DD DSN=JIM,UNIT=DISK,DISP=(NEW,DELETE),SPACE=(TRK,(10,10))
    //SYSIN    DD *
B   NPASS=2         NV=4       MAXSTAR=30     CRIT(3)=3    CRIT(4)=9
    TITLE=3         NEVE=12    ECHO='ERFZ'    TEST='1'B    TRANS='1'B
    RESTRICT='1'B   NCL=2      INDEP='1'B     NGE=100      NCRIT=4
    NTAU=4;
C   ************************************************************************
    TEST RUN
    ************************************************************************
D   1  'I'
    3
E   2  3  4  2
F   6  6
G   0  2
H   .5  1
I   (X1=0) (X2=0) (X3=0) -> (X4=0).
J   0  0  0  0
    0  0  2  0
    0  2  0  1
    0  1  1  0
    0  2  2  1
    FORMULA
    (X4=0) (X2=1 2) (X1=0) (X3=1);
    0  2  1  1
    0  0  3  0
    1  2  0  0
    1  1  1  0
    1  0  2  1
    1  2  3  0

A  JCL
B  Control parameters
C  Title
D  NSPEC, TYPE
E  Number of levels/variable
F  Number of events/set
G  Number of formulas/set
H  Fraction of events/pass
I  Restriction
J  Event list (6 events/set)

Figure B-1 (a)
K   FORMULA
    (X1=1) (X2=1 2) (X4=0) (X3=0 1);
    FORMULA
    (X1=1) (X3=2 3)
L   Z(1,1)=2;
    //TEST DD *
M   4  4
N   0  0  1  0
    0  1  1  0
    0  1  3  0
    0  0  1  1
    1  1  0  1
    1  2  3  1
    0  1  3  0
    0  1  3  1
    //TRAN DD *
O   'ACCEPT'
    'REJECT'
P   'NEW'                    variable x1
        'YES'                values
        'NO'
    'COLOR'                  variable x2
        'RED'                values
        'BLUE'
        'ORANGE'
    'SIZE'                   variable x3
        'SMALL'              values
        'MEDIUM'
        'LARGE'
        'X LARGE'
    'WEIGHT'                 variable x4
        'HEAVY'              values
        'LIGHT'

K  Formulas (2 guesses for set 2)
L  Cost of variables (variable 1 has cost 2 for set 1)
M  Number of test events/set
N  List of test events
O  Name of each set
P  Variable names and variable value names

Figure B-1 (b)



Explanation of Figure B-2.

The first part contains an echo of the input (item A).  Next (item B), the
program prints the formulas obtained after the first iteration (pass), which
used 50% of the input events (the first 3 events in each class; see item H in
Fig. B-1).  The classes, variables and values of variables are specified by
names.  Together with each complex (term), a triple of numbers
(NEW, IND, COV) is printed (item C), where

NEW - denotes the number of events covered by the given complex and not
      covered by the previous complexes on the list of complexes generated
      for this class,

IND - denotes the number of events covered only by the given complex,

COV - denotes the total number of events covered by the given complex.
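
The three counts can be computed directly from their definitions. The sketch below is a minimal Python illustration, not the program's own code; the function names are hypothetical, and it assumes that value names are assigned to values in order starting from 0 (so SIZE=MEDIUM is x3=1, SIZE=X LARGE is x3=3, WEIGHT=LIGHT is x4=1 and NEW=NO is x1=1). It uses the six learning events of class 2 and the REJECT cover obtained in the second iteration.

```python
def covers(complex_, event):
    """A complex maps a 1-based variable index to its admitted values."""
    return all(event[var - 1] in values for var, values in complex_.items())


def new_ind_cov(complexes, events):
    """Return one (NEW, IND, COV) triple per complex, in list order."""
    triples = []
    for i, cpx in enumerate(complexes):
        covered = [e for e in events if covers(cpx, e)]
        # NEW: covered here but by none of the earlier complexes.
        new = [e for e in covered
               if not any(covers(p, e) for p in complexes[:i])]
        # IND: covered here and by no other complex at all.
        ind = [e for e in covered
               if not any(covers(o, e)
                          for j, o in enumerate(complexes) if j != i)]
        triples.append((len(new), len(ind), len(covered)))
    return triples


# Learning events of class 2 (REJECT) from item J of Figure B-1.
REJECT_EVENTS = [
    (0, 2, 1, 1), (0, 0, 3, 0), (1, 2, 0, 0),
    (1, 1, 1, 0), (1, 0, 2, 1), (1, 2, 3, 0),
]

# REJECT cover from the second iteration, under the assumed value coding.
REJECT_COVER = [
    {3: {1}, 4: {1}},   # (SIZE=MEDIUM)(WEIGHT=LIGHT)
    {3: {3}},           # (SIZE=X LARGE)
    {1: {1}},           # (NEW=NO)
]

print(new_ind_cov(REJECT_COVER, REJECT_EVENTS))
# [(1, 1, 1), (2, 1, 2), (3, 3, 4)]
```

These values agree with the triples (1 1 1), (2 1 2) and (3 3 4) that appear for the REJECT cover of the second iteration in Figure B-2.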

The program also lists the number of times each cost criterion has
been evaluated (item D).  Item E gives a symbolic specification of the obtained
formulas.  Next, an extended confusion matrix is printed (item F) as the result
of evaluating the obtained formulas on the testing events (item N in Fig. B-1).
We can see from the matrix that all testing events of the first class ('ACCEPT')
have been misclassified, and all the events of the second class ('REJECT') have
been correctly classified.

Item G specifies the number of times each complex in the cover of
each class has been satisfied by testing events in the case of correct decisions
(the second complex, C2, of class D2 correctly classified 3 testing events, and
the third one, C3, correctly classified 1 testing event).
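
The figures quoted for the first iteration can be reproduced with a small sketch. It is a simplification written in Python, not the evaluation procedure of AQ11: it assumes that a test event is assigned to the class of the first complex it strictly satisfies, and it does not model ranked decisions or the TAU threshold. The names `classify` and `evaluate` are hypothetical; the formulas are those of item E and the test events those of item N in Fig. B-1.

```python
def covers(complex_, event):
    """A complex maps a 1-based variable index to its admitted values."""
    return all(event[var - 1] in values for var, values in complex_.items())


# Formulas obtained after the first iteration (item E of Figure B-2).
COVERS = {
    "ACCEPT": [{1: {0}, 3: {0}}, {1: {0}, 3: {2}}],
    "REJECT": [{3: {1}}, {3: {3}}, {1: {1}}],
}

# Testing events, 4 per class (item N of Figure B-1).
TEST_EVENTS = {
    "ACCEPT": [(0, 0, 1, 0), (0, 1, 1, 0), (0, 1, 3, 0), (0, 0, 1, 1)],
    "REJECT": [(1, 1, 0, 1), (1, 2, 3, 1), (0, 1, 3, 0), (0, 1, 3, 1)],
}


def classify(event):
    """Return (class, complex number) of the first satisfied complex."""
    for label, cover in COVERS.items():
        for k, cpx in enumerate(cover, start=1):
            if covers(cpx, event):
                return label, k
    return None, None


def evaluate():
    """Build a confusion matrix and count correct decisions per complex."""
    confusion = {t: {a: 0 for a in COVERS} for t in TEST_EVENTS}
    hits = {}
    for true_label, events in TEST_EVENTS.items():
        for event in events:
            assigned, k = classify(event)
            if assigned is None:
                continue            # event satisfied no complex
            confusion[true_label][assigned] += 1
            if assigned == true_label:
                hits[(assigned, k)] = hits.get((assigned, k), 0) + 1
    return confusion, hits


confusion, hits = evaluate()
print(confusion)
print(hits)
```

Under these assumptions all four ACCEPT events fall into REJECT, all four REJECT events are assigned correctly, and the correct decisions are credited to complexes C2 (3 events) and C3 (1 event) of class 2, as described above.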

Item H specifies the percentage of correct decisions and the indecision
ratio for various values of parameter TAU (generally, the higher TAU, the
greater the number of correct decisions, but also the greater the indecision
ratio).

Item I lists the formulas obtained in the second iteration (which
used all the learning events), and item J gives the corresponding confusion
matrix.  We can see that this time 50% of the testing events of class 1 and
100% of those of class 2 were correctly classified.  Items K and L give the same
information as items G and H, respectively, but for the formulas obtained in
the second iteration.
***************************************************************************
TEST RUN
***************************************************************************

A   TR='0'B  NPASS=2  NV=4  MAXSTAR=30  TITLE=3  NCL=2  TEST='1'B  MODE='IC'
    STGY=1  INDEP='1'B  GEN='1'B  ECHO='ERFZ'  NGE=100  #EX=1  SAVE='0'B
    PUNY=1.99999E-02  RESTRICT='1'B  RTEST='0'B  TAU=1.89999E-02  IRK=2
    CPXEV='1'B  TRANS='1'B  NTAU=4  TAU INC=1.99999E-02  ORD='1'B;

    CRIT LIST   1  0.00   2  0.00   3  0.00   9  0.00
    113
    2 PASSES   0.50  1.00
    NUMBER OF LEVELS / VARIABLE   2  3  4  2
    NUMBER OF EVENTS / CLASS      6  6
    NUMBER OF FORMULAS / CLASS    0  2

    RESTRICTIONS ON EVENT SPACE
    (X1=0) (X2=0) (X3=0) -> (X4=0)

    LIST OF INPUT EVENTS
     1   0  0  0  0
     2   0  0  2  0
     3   0  2  0  1
     4   0  1  1  0
     5   0  2  2  1
     6   (X4=0) (X2=1,2) (X1=0) (X3=1);
     7   0  2  1  1
     8   0  0  3  0
     9   1  2  0  0
    10   1  1  1  0
    11   1  0  2  1
    12   1  2  3  0

    INPUT FORMULAS
    (X1=1) (X2=1,2) (X4=0) (X3=0,1);
    (X1=1) (X3=2,3);

    COSTS OF VARIABLES WHICH ARE NOT 1:  Z(1, 1)=  2;

    TIME FOR INPUT OF DATA   36 CENTISECONDS

B   *****COVER OF ACCEPT*****               C   NEW IND COV
    CPX 1: (NEW= YES) (SIZE= SMALL)             (  2   2   2)
    CPX 2: (NEW= YES) (SIZE= LARGE)             (  1   1   1)

    *****COVER OF REJECT*****
    CPX 1: (SIZE= MEDIUM)                       (  1   1   1)
    CPX 2: (SIZE= X LARGE)                      (  1   1   1)
    CPX 3: (NEW= NO)                            (  1   1   1)

D   CRIT #    # TIMES EV.
      1          12
      2          11
      3           4
      9           4

    TIME FOR THIS PASS   19 CENTISECONDS

Figure B-2 (a)
FORMULAS FOR CLASS   1
CPX   :(X1 = 0) (X3 = 0)
CPX   :(X1 = 0) (X3 = 2)

FORMULAS FOR CLASS   2
CPX   :(X3 = 1)
CPX   :(X3 = 3)
CPX   :(X1 = 1)


NUMBER  OF  EVENTS  IN  EACH  CLASS 


ASSIGNED  DECISION 


CORRECT  //  EVENTS/  TIE  UNSP  1  1  D  1  D  2 
ASSIGN   #  RK1  DEC         |  | 

D  1  ACCEPT                   .50  1.00 

.50  1.00 
.50  1.00 
.50  1.00 

4/4               |  0  |   4 
1.00                0%  100% 

D  2  REJECT                   .50  1.00 

1.00 
.50  1.00 
.50  1.00

4/4                0   |  4  | 
1.00                0%  100% 

NUMBER OF CORRECT DECISIONS / COMPLEX

                                 COMPLEXES
EVENT SETS   C1  C2  C3  C4  C5  C6  C7  C8  C9  C10  C11  C12  C13
    1
    2             3   1


Figure  B-2    (b) 


H   TAU ESTIMATION TABLE   (% CORRECT / INDECISION RATIO)

                            VALUE OF TAU
    CLASS       0.00       0.02       0.04       0.06
    1         0.00/1.00  0.00/1.00  0.00/1.00  0.00/1.00
    2         1.00/1.00  1.00/1.00  1.00/1.00  1.00/1.00

    TOTALS    0.50/1.00  0.50/1.00  0.50/1.00  0.50/1.00

    FINAL STATISTICS
    INDECISION RATIO:   1.00
    PERCENT CORRECT:   50.00

    TIME TO EVALUATE FORMULAS       35 CENTISECONDS


*****COVER OF ACCEPT*****

CPX 1:(NEW= YES) (SIZE= SMALL MEDIUM) (WEIGHT= HEAVY)     (  3   2   3)
CPX 2:(NEW= YES) (SIZE= LARGE)                            (  2   2   2)
CPX 3:(NEW= YES) (SIZE= SMALL)                            (  1   1   2)

*****COVER OF REJECT*****

CPX 1:(SIZE= MEDIUM) (WEIGHT= LIGHT)                      (  1   1   1)
CPX 2:(SIZE= X LARGE)                                     (  2   1   2)
CPX 3:(NEW= NO)                                           (  3   3   4)

CRIT #    # TIMES EV.
  1           8
  2           7
  3           4
  9           3

TIME FOR THIS PASS   20 CENTISECONDS


r 


ASSIGNED  DECISION 

CORRECT  //  EVENTS/   TIE  UNSP  |  I  D  1   D  2 
ASSIGN   //  RK1  DEC         I  I 


D   1  ACCEPT 


4/   4 
1.00 